124 research outputs found
Multi-omics integration accurately predicts cellular state in unexplored conditions for Escherichia coli.
A significant obstacle in training predictive cell models is the lack of integrated data sources. We develop semi-supervised normalization pipelines and perform experimental characterization (growth, transcriptional, proteome) to create Ecomics, a consistent, quality-controlled multi-omics compendium for Escherichia coli with cohesive metadata. We then use this resource to train a multi-scale model that integrates four omics layers to predict genome-wide concentrations and growth dynamics. The genetic and environmental ontology reconstructed from the omics data is substantially different from, and complementary to, the genetic and chemical ontologies. The integration of different layers confers an incremental increase in prediction performance, as does information about the known gene regulatory and protein-protein interactions. The predictive performance of the model ranges from 0.54 to 0.87 for the various omics layers, which far exceeds various baselines. This work provides an integrative framework of omics-driven predictive modelling that is broadly applicable to guide biological discovery.
KGLM: Integrating Knowledge Graph Structure in Language Models for Link Prediction
The ability of knowledge graphs to represent complex relationships at scale
has led to their adoption for various needs including knowledge representation,
question-answering, fraud detection, and recommendation systems. Knowledge
graphs are often incomplete in the information they represent, necessitating
knowledge graph completion tasks such as link and relation
prediction. Pre-trained and fine-tuned language models have shown promise in
these tasks although these models ignore the intrinsic information encoded in
the knowledge graph, namely the entity and relation types. In this work, we
propose the Knowledge Graph Language Model (KGLM) architecture, where we
introduce a new entity/relation embedding layer that learns to differentiate
distinctive entity and relation types, therefore allowing the model to learn
the structure of the knowledge graph. We show that further
pre-training the language models with this additional embedding layer using the
triples extracted from the knowledge graph, followed by the standard
fine-tuning phase, sets a new state of the art for the link
prediction task on the benchmark datasets.
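The idea of an extra entity/relation embedding layer can be illustrated with a minimal sketch. It is not the KGLM implementation: the class `EmbeddingTable`, the function `embed_triple`, the three type ids (0 = head entity, 1 = relation, 2 = tail entity), and the choice to sum the type embedding with the token embedding are all assumptions made here for illustration.

```python
import random


class EmbeddingTable:
    """A lookup table of small random vectors (hypothetical stand-in for a learned layer)."""

    def __init__(self, num_rows, dim, rng):
        self.rows = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                     for _ in range(num_rows)]

    def lookup(self, index):
        return self.rows[index]


def embed_triple(token_ids, type_ids, token_table, type_table):
    """Add a type embedding to each token embedding of a linearized triple.

    Assumption: the type signal is combined by element-wise addition, the way
    position or segment embeddings are commonly combined in transformer models.
    """
    if len(token_ids) != len(type_ids):
        raise ValueError("token_ids and type_ids must have the same length")
    return [
        [t + s for t, s in zip(token_table.lookup(tok), type_table.lookup(typ))]
        for tok, typ in zip(token_ids, type_ids)
    ]


rng = random.Random(0)
token_table = EmbeddingTable(num_rows=10, dim=4, rng=rng)
type_table = EmbeddingTable(num_rows=3, dim=4, rng=rng)  # head, relation, tail

# The same token id (5) appears as head and as tail; only the type id differs.
vectors = embed_triple([5, 7, 5], [0, 1, 2], token_table, type_table)
```

In this toy setup the same token receives a different input vector depending on whether it is marked as head or tail, which is the kind of structural signal the abstract describes.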
A systems biology analysis of brain microvascular endothelial cell lipotoxicity.
Background:
Neurovascular inflammation is associated with a number of neurological diseases including vascular dementia and Alzheimer's disease, which are increasingly important causes of morbidity and mortality around the world. Lipotoxicity is a metabolic disorder that results from accumulation of lipids, particularly fatty acids, in non-adipose tissue leading to cellular dysfunction, lipid droplet formation, and cell death.
Results:
Our studies indicate for the first time that the neurovascular circulation also can manifest lipotoxicity, which could have major effects on cognitive function. The penetration of integrative systems biology approaches is limited in this area of research, which reduces our capacity to gain an objective insight into the signal transduction and regulation dynamics at a systems level. To address this question, we treated human microvascular endothelial cells with triglyceride-rich lipoprotein (TGRL) lipolysis products and then used genome-wide transcriptional profiling to obtain transcript abundances over four conditions. We then identified differentially expressed regulatory genes and their targets by analyzing the datasets with various statistical methods. We created a functional gene network by exploiting co-expression observations through a guilt-by-association assumption. Concomitantly, we used various network inference algorithms to identify putative regulatory interactions, and we integrated all predictions to construct a consensus gene regulatory network that is TGRL lipolysis product specific.
Conclusion:
Systems biology analysis has led to the validation of putative lipid-related targets and the discovery of several genes that may be implicated in lipotoxicity-related brain microvascular endothelial cell responses. Here, we report that activating transcription factor 3 (ATF3) is a principal regulator of TGRL lipolysis products-induced gene expression in human brain microvascular endothelial cells.
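A co-expression network built under a guilt-by-association assumption can be sketched as follows. This is only an illustration, not the authors' pipeline: the gene names, the expression values over four conditions, the use of Pearson correlation, and the 0.9 threshold are all invented for the example.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def coexpression_edges(profiles, threshold=0.9):
    """Link two genes when the absolute correlation of their profiles meets the threshold."""
    edges = []
    for gene_a, gene_b in combinations(sorted(profiles), 2):
        r = pearson(profiles[gene_a], profiles[gene_b])
        if abs(r) >= threshold:
            edges.append((gene_a, gene_b, r))
    return edges


# Hypothetical transcript abundances over four conditions (made-up values).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [3.0, 1.0, 4.0, 2.0],
}
edges = coexpression_edges(profiles)
```

With these toy profiles only geneA and geneB are linked, so an annotation known for one would be tentatively transferred to the other.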
A synthetic biology approach to self-regulatory recombinant protein production in Escherichia coli
Background:
Recombinant protein production is a process of great industrial interest, with products that range from pharmaceuticals to biofuels. Since high level production of recombinant protein imposes significant stress in the host organism, several methods have been developed over the years to optimize protein production. So far, these trial-and-error techniques have proved laborious and sensitive to process parameters, while there has been no attempt to address the problem by applying Synthetic Biology principles and methods, such as integration of standardized parts in novel synthetic circuits.
Results:
We present a novel self-regulatory protein production system that couples the control of recombinant protein production with a stress-induced, negative feedback mechanism. The synthetic circuit allows the down-regulation of recombinant protein expression through a stress-induced promoter. We used E. coli as the host organism, since it is widely used in recombinant processes. Our results show that the introduction of the self-regulatory circuit increases the soluble/insoluble ratio of recombinant protein at the expense of total protein yield. To further elucidate the dynamics of the system, we developed a computational model that agrees with the observed experimental data and provides insight into the interplay between protein solubility and yield.
Conclusion:
Our work introduces the idea of a self-regulatory circuit for recombinant protein products, and paves the way for processes with reduced external control or monitoring needs. It demonstrates that the library of standard biological parts serves as a valuable resource for initial synthetic blocks that need to be further refined to be successfully applied in practical problems of biotechnological significance. Finally, the development of a predictive model in conjunction with experimental validation facilitates a better understanding of the underlying dynamics and can be used as a guide to experimental design.
An integrative, multi-scale, genome-wide model reveals the phenotypic landscape of Escherichia coli.
Given the vast behavioral repertoire and biological complexity of even the simplest organisms, accurately predicting phenotypes in novel environments and unveiling their biological organization is a challenging endeavor. Here, we present an integrative modeling methodology that unifies under a common framework the various biological processes and their interactions across multiple layers. We trained this methodology on an extensive normalized compendium for the gram-negative bacterium Escherichia coli, which incorporates gene expression data for genetic and environmental perturbations, transcriptional regulation, signal transduction, and metabolic pathways, as well as growth measurements. Comparison with measured growth and high-throughput data demonstrates the enhanced ability of the integrative model to predict phenotypic outcomes in various environmental and genetic conditions, even in cases where their underlying functions are under-represented in the training set. This work paves the way toward integrative techniques that extract knowledge from a variety of biological data to achieve more than the sum of their parts in the context of prediction, analysis, and redesign of biological systems
The Computational Diet: A Review of Computational Methods Across Diet, Microbiome, and Health.
Food and human health are inextricably linked. As such, revolutionary impacts on health have been derived from advances in the production and distribution of food relating to food safety and fortification with micronutrients. During the past two decades, it has become apparent that the human microbiome has the potential to modulate health, including in ways that may be related to diet and the composition of specific foods. Despite the excitement and potential surrounding this area, fully understanding the complexity of the gut microbiome, the chemical composition of food, and their interplay in situ remains a daunting task. However, recent advances in high-throughput sequencing, metabolomics profiling, compositional analysis of food, and the emergence of electronic health records provide new sources of data that can contribute to addressing this challenge. Computational science will play an essential role in this effort as it will provide the foundation to integrate these data layers and derive insights capable of revealing and understanding the complex interactions between diet, gut microbiome, and health. Here, we review the current knowledge on diet-health-gut microbiota, relevant data sources, bioinformatics tools, machine learning capabilities, as well as the intellectual property and legislative regulatory landscape. We provide guidance on employing machine learning and data analytics, identify gaps in current methods, and describe new scenarios to be unlocked in the next few years in the context of current knowledge.
Anomalous QBO influence in the long period Kelvin waves in the low latitude mesosphere and lower thermosphere region over Kolhapur (16.7N, 74.2E)
15th MST Radar Workshop. Session M6: Middle atmosphere dynamics and structure. May 31 (Wed), NIPR Auditorium
Nutrient Estimation from 24-Hour Food Recalls Using Machine Learning and Database Mapping: A Case Study with Lactose.
The Automated Self-Administered 24-Hour Dietary Assessment Tool (ASA24) is a free dietary recall system that outputs fewer nutrients than the Nutrition Data System for Research (NDSR). NDSR uses the Nutrition Coordinating Center (NCC) Food and Nutrient Database, both of which require a license. Manual lookup of ASA24 foods in NDSR is time-consuming but currently the only way to acquire NCC-exclusive nutrients. Using lactose as an example, we evaluated machine learning and database matching methods to estimate this NCC-exclusive nutrient from ASA24 reports. ASA24-reported foods were manually looked up in NDSR to obtain lactose estimates and split into training (n = 378) and test (n = 189) datasets. Nine machine learning models were developed to predict lactose from the nutrients common between ASA24 and the NCC database. Database matching algorithms were developed to match NCC foods to an ASA24 food using only nutrients ("Nutrient-Only") or the nutrient and food descriptions ("Nutrient + Text"). For both methods, the lactose values were compared to the manual curation. Among machine learning models, the XGB-Regressor model performed best on held-out test data (R² = 0.33). For the database matching method, Nutrient + Text matching yielded the best lactose estimates (R² = 0.76), a vast improvement over the status quo of no estimate. These results suggest that computational methods can successfully estimate an NCC-exclusive nutrient for foods reported in ASA24.
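The difference between "Nutrient-Only" and "Nutrient + Text" matching can be illustrated with a small sketch. It does not reproduce the study's algorithms: the scoring rule (Euclidean nutrient distance minus a weighted Jaccard word overlap), the function names, the `text_weight` parameter, and the food records are all assumptions invented for this example.

```python
import math


def nutrient_distance(a, b, keys):
    """Euclidean distance between two foods over the shared nutrient keys."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


def text_similarity(desc_a, desc_b):
    """Jaccard similarity between the lowercase word sets of two descriptions."""
    words_a, words_b = set(desc_a.lower().split()), set(desc_b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def best_match(query, candidates, keys, text_weight=0.0):
    """Return the candidate with the lowest score.

    score = nutrient distance - text_weight * text similarity
    text_weight=0.0 gives nutrient-only matching (hypothetical scoring rule).
    """
    def score(candidate):
        distance = nutrient_distance(query["nutrients"], candidate["nutrients"], keys)
        overlap = text_similarity(query["description"], candidate["description"])
        return distance - text_weight * overlap

    return min(candidates, key=score)


# Made-up records: one food missing lactose, two licensed-database candidates.
keys = ["sugar_g", "fat_g", "protein_g"]
asa24_food = {"description": "milk whole",
              "nutrients": {"sugar_g": 5.0, "fat_g": 3.3, "protein_g": 3.2}}
ncc_foods = [
    {"description": "soy drink plain", "lactose_g": 0.0,
     "nutrients": {"sugar_g": 5.1, "fat_g": 3.2, "protein_g": 3.2}},
    {"description": "whole milk", "lactose_g": 4.8,
     "nutrients": {"sugar_g": 4.8, "fat_g": 3.3, "protein_g": 3.2}},
]

nutrient_only = best_match(asa24_food, ncc_foods, keys)
nutrient_text = best_match(asa24_food, ncc_foods, keys, text_weight=1.0)
```

In this toy case the nutrient-only match lands on the nutritionally closest but lactose-free candidate, while adding the description overlap selects the milk record, which shows why text can help for a nutrient that is not predictable from the shared nutrients alone.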